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1 . Introduction 


Given a system of linear equations 


Ax = b, 


where 
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the solution vector x can be found as 

x = A~ X b 


provided A is non-singular. Using direct methods the A~ l is usually decomposed as a product of 
elementary matrices which reduces the original A to an identity matrix. This is true whether the 
method selected is of the Gaussian Elimination type which triangularize A or the Gauss-Jordan type 
in which diagonalization of A takes place. In actual computation the decomposed A _1 is of course 
only an approximation to the exact one due to round-off errors incurred during the execution of 
the computational steps of the specific method chosen. Since there are many variations in this class 
of decomposition methods, one wonders whether one variation is ''better" than the other in terms 
of accuracy of the computed solution. 

In this paper we show, under mild restrictions on the implementation of some algorithms, 
that all decomposition methods are "equivalent" in terms of our error complexity measures. Some 
preliminary results are presented in Section 2. ITie classification and analysis of decomposition 
methods are given in Section 3. 

2. Some Preliminary Results 

Given a normalized floating-point system with a f-digit base /? mantissa, the following 
equations can be assumed to facilitate the error analysis of general arithmatic expressions using only 
or / operationsfl]: 

(2.1) fl{x#y) = (x#y) A, # e { 4- 
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where 


{ 4r B ] 1 for rounded operations 

2 _, 

Z? 1 1 for chopped operations 

and jc and y are given machine floating-point numbers and fl(.) is used to denote the computed 
floating-point result of the given argument. We shall call A the unit A -factor. 

In general one can apply (2.1) repeatedly to a sequence of arithmetic steps, and the computed 
result z can be expressed as 


*< 2 n ) 

M l 

where each z^ or z dj is an exact product of error-free data, and A* stands for the product of k pos- 
sibly different A-factors. We should emphasize that all common factors between the numerator and 
denominator should have been factored out before z can be expressed in its final ration;*! form of 
(2.2). Following [2], we shall henceforth call such an exact product of error-free data a basic term, 
or simply a term. Thus X(z n ) or A(z d ) is then the total number of such terms whose sum constitutes 
z„ or z d , respectively, and a(z m ) or o(z dj ) gives the possible number of round-off occurrences during 
the computational process. As an example, consider the solution of two linear equations in two 
variables as follows: 


ax ] -f bx 2 = c 


dx x +e * 2 =/ 


The exact solution x 2 can be found as 


*2 =- 


/- 


*C 


i* b 


The computed x 2 is the result of the following expression: 
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*2 =y7(W]/w 2 ), M'] =fl(f~f\)> w 2 ~fK e ~ c i). 
/, =« * c), e, =/7(rf, * b), =fl(dla). 
Applying (2.1) repeatedly to the above expression, wc have 

y A _ A A 3 ^ a / A - dcA 2 
a 3 <aeA — <iAA 3 

which is of the form of (2.2). We define the following two measures: 
maximum error complexity: 

(23) »(2„) mvgfo 

cumulative error complexity: 

X(r„) 

(2-4) j(z„)= j(z^)= ^a(z^). 

;=1 7=1 


Different algorithms used to compute the same z can then be compared using the above error 
complexity measures and the number of basic terms created by each algorithm. 


For convenience we will use z d and z„ to represent the 3-tuples {2(z rf ), o(z d ), s(z d )} and 
f/fzj, «t(z„), s(zj} , respectively, so that the computed z of (2.2) is fully characterized by 



In division-free computations any computed z will have only the numerator part z„. The 
following lemma is useful in dealing with intermediate computed results: 

Lemma 2. 1 Given x and y with their associated x n and JJ„ , 

(i) if z = xy, then 

z n = xJn =y n X n = °(x n ) + *(*^0/,) + 2(x„).r0„)} , 

(ii) i f z = x ±y, then 

^n = x n +y n =y n + x n = {/!(*„) + max(a(x„), a(y n )), s(x n ) + s(y n )} . 
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Proof. The results can be obtained easily by expressing x and y as 


Mx n ) A(y„) 

x =Yj X ™ A<T(Xm) ’ y = E^ A<T(>W) 

i=l 7=1 

and applying (2.3), (2.4) and the definition of to find z n . Q.E.D. 


71ie unit A-factor is then defined as 


(2.5a) 


A = {1,1,1}. 


One can obtain easily using Lemma 2. 1 to get 


(2-5 b) 


A' = 


For general floating-point computations, we have the following lemma: 


Lemma 2.2 Given x and y with their associated 

_ o(x n ), r(x„)} _ _ v n _ {Mynh ■*(>»)) 

A ~ X d - mx d ), <7{x d ), s(x d )} y y d ~ {A(y d ), a(y d ), s{y d )} 

(i) if z = fl(x + >’) and there is no common factors between x d and y d , then 

Z Zd X<$d 


where 

X{z n ) = l{x n )X(y d ) + X(y n )X(x d ), 
k(z d ) = Hx d )A(y d ), 

a(z n ) = 1 + max(<r(x„) + a(y d ), a(y n ) + <r(x d )), 

a(z d ) = <r(x d ) + o(y d ), 

s(z n ) = k(x n )s(y d ) + A(y d )s(x n ) + My n )s(x d ) + A(x d )s(y n ) + \{z n ), 
s(z d ) = *(x d )s(y d ) + X(y d )s{x d )\ 

(ii) if z = fl(x xy) and there is no common factors between x n and y d or between y„ and x d , then 

*n _ WrA 

Z z d xyy d 

where 

t(z n ) = X(x„)?.{y n ), 

Kz d ) = Hx d )X(y d ), 

a ( z n) = 1 "h &( x n) T a (yn)> 

a(z d ) = a(x d ) + a(y d ), 

s(z n ) = A(x„)s(y n ) + + Mz„), 

s(z d ) = X{x d )s(y d ) + t(y d )s(x d ); 

(iii) if z = fl{xjy) and there is no common factors between x n and y n or between x d and y d , then 
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where 


y n *d 


>.(z n ) = k{x n )k(y d ), 
k(z d ) = X[x d )k{y n ), 

°(z n ) - °( x n) + °b>d) + !. 

°(Zd) = °( x d) + °iy n )’ 

s{z n ) = M x n) s b>d) + A(yd) s ( x n) + *( z n)> 
s{z d ) = Hx d )s(y n ) + 

P roof. First we apply (2.1) to each case and obtain 

z = fl(x#y) = (x#y)A, # e { + , - , x , /}. 

The results can then be obtained easily by using (2.3) and (2.4). Q.E.D. 


Often one needs to add up an extended sum given by 


(2.6) 


k a k 

A V Zn 1 

z d z d i 


/=! 


The order of summation certainly has to be specified. One would like to select an order to mini- 
mize the incurred additional error complexities of the final computed z. We consider the following 
four strategies. 


If the items are added recursively in parallel by divide-and- conquer, then the strategy is called 
left-heavy if 


AAA 

z = z x + z 2 


where 


r*/2i k 

z \ = Yj *'■’ ^ = Yj *'■ 

'=• i= Tk/2] +] 


Similarly the strategy is called right-heavy if 


Lfe/2J k 

A A A A V"* A 

Z - z 3 + Z 4 , z 3 = 2 ^ X r Z 4 = 2 ^ ; 

/asl t= Lfc/2j +1 


If the items are summed up in sequential order, then we have the common strategics of lcft- 
to-right or right-to-lcft. We have the following useful lemma: 
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Lemma 2.3 Given (2.6) and it is desired to find 



where 


/=1 

2 k ~ 2 X(z nX ) = 2*- 2 Mz n2 ) = 2 k ~hCz n3 ) = ... = 2°?.(z nk ), 
o(z n j) + k - 1 < <7(z„ 2 ) + £ ~ 1 ^ + k.-2< ...< o{z nk ) + 1, 


then 


k 

(0 A(Z„) = A(z„) = regardless of the strategy chosen, 

/= 1 

| T log /cl if the strategy is right-heavy, 

(/,) + w where w = J LM U “ , the , s, f =» is . : 

" j & — 1 if the strategy is nght-to-left, 

I 1 if the strategy is lcft-to-right. 

^ A f(2* — 2)2(z nl ) if the strategy is left-to-right, 

(»0 J ( z n) - Zj s ( z m) + jj-j + ( 2*- 3)2* -2 ]2(z nl ) if the strategy is right-to-lcft, 

k k 

(/V) J\(z m .) + Llog AJ 2 k ~'X(2 ni ) < s(Z„) < £/&) + r log k] 2 k - ] A(z nl ) 

i= 1 (=1 

if the strategy is either left-heavy or right-heavy. 


Proof. The results for X(z n ) are obvious. For error complexities consider first the sequential 
strategics. If the strategy is left-to-right , then we can easily obtain 

z n = z nl& k 1 + Z n2 ^ k 1 +- + Z nk A } . 

Applying Lemma 2. 1 to the above equation, we obtain 

a(z n ) = max(ff(4i) + k - 1, <x(z„ 2 ) + k ~ 1 a (^nk) + 1) 

= a ( z nk) + 1 

by assumption. Also 

k 

s(z n ) = £s(zj + (k- 1 )?.(z nX ) + (k- l)X(z r2 ) + ... + (1 )X(z nk ) 

/= 1 
k 

= ^4 m .) + ( 2* -2)^,) 
i= 1 

by simplification. Hence the theorem is true. 
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If the strategy is right-to-lcft, then we have 


— +z nJc _ l A k 1 + z nk A k 1 

By repeated application of Lemma 2. 1 we have certainly 

o(z n ) = °(z n k) + k ~ 1 - 

Also 


k 

s(z„) = £s(zj + A(z ni ) + 2A(z nl ) + ... + (k- 1 mKjc-x) + A-)] 

/=! 


which can also be simplified to the desired form. Hence the theorem is true in this case. For the 
parallel strategies, the computed z n can be expressed as 

= z nl A Jl + z nl A h + ... + z nk A Jk 

where 

j x = riogjfel ^/ 2 ^ ^Jk = Llog^J if the strategy is left-heavy, 
j x = ilog kj <j 2 < ... < jjc = r log k] if the strategy is right -heavy, 

Now if the strategy is right-heavy, it is obvious that 

a{z n ) = a(z nk ) + r log k\ . 

If the strategy is left-heavy, then by assumption we have 

a(z n k ) + [Jog kj S a(z nl ) + Llog kj +k-i ^ a (z„ t ) 4 - T log *1 for i < k 

Hence 

a(z„) = a(z nk ) + [log kj . 

The cumulative error complexity results are obvious. This completes our proof. Q.E.D. 


Now from the results of Lemma 2.3 we see obviously that a(z n ) will be minimal if the strategy 
is left-to-right. For s(z n ) we see that 

2 k -2 < r log Arl 2 k ~ l < [1 + (2k - 3)2* -2 ], for k > 3 or k = 2. 

Furthermore for k = 3 then we have 


\z nl A 2 + z n 2 A 2 + z n3 A if the strategy is left-heavy, 
n ^ z nl A + z n2 A 2 + z n2 A 2 if the strategy is right-heavy. 


= 


Hence 



s{z n ) = s(z n] ) + s(z n2 ) + s(z n3 ) + 


{ 6 l(z n \) if the strategy is left-heavy, 
lk{z nX ) if the strategy is right-heavy, 
6 k(z n {) if the strategy is left-to-right, 
7A(z n] ) if the strategy is right-to-left. 


Thus the left-to-right strategy gives us, in all cases, the minimal cumulative error complexity also 
in the computed result. We conclude with the following theorem: 

'ITicorem 2.1 If (2.6) is to be computed and the conditions specified in Lemma 2.3 are sat- 
isfied, then one should choose the strategy left-to-right in order to minimize both the maximum 
and cumulative error complexity of the computed extended sum. 

3. Inverse Decomposition Methods 

To fmd the inverse decomposition, one applies a sequence of elementary trasformations to 
the matrix A so that zeroes are created in the lower and upper parts of the matrix. The final reduced 
matrix is of course the identity matrix so that an implicit A~ l can be obtained as a product of the 
sequence of elementary matrices which can then be applied to b to obtain the desired solution 
x — A~ l b . In general a large collection of methods can be classified as the usual triangular de- 
composition of A into a product of lower and upper triangular matrices followed by explicitly or 
implicitly inverting the calculated triangular matrices. To see this consider the diagonalization of 
an ;V x N matrix. We shall assume that pivoting is not necessary. The Gauss-Jordan method 
would create a sequence of matrices C u C 2 , ... and C N such that 

N 

C N ... C 2 C,A = D , Ci = I N ~ ^ c ji e j e J 

where / A and ej represent the N x N identity matrix and the transpose of the /-th column of the 
identity matrix , respectively. The matrix C t is chosen such that the transformed matrix 
C 4 C f _,...C 1 A will have all its off-diagonal elements of the first / columns zeroed. It is also obvious 
that the matrix C t depends on the previously generated Q_ 2 , C, and A . Now C t can be ex- 
pressed as 
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I 


AT i-l 

Cj — L t Vi = L' t L' t , ^ = / y “ c ji c j e i » ~ Av ~ ^'fji C j C i 

;=/+! y=i 


where f/, and L, represent, respectively, the upper and lower part of C,. Furthermore we can easily 
show by induction that 

C N ... C 2 C } = t/^ ... U 3 (J 2 L N _ l ... Z^Lj 


Hence 


= D , fT 1 = t/ 3 t/ 2 , ZT 1 = L N _ } ...LjL,. 

Since the steps in L~ l A describe the elimination stage of the Gaussian Elimination method in con- 
verting A to an upper triangular form and the subsequent U^ l (L l A) is the result of a forward 
elimination process to create zeroes in the upper triangular L~ l A by columns, the Gauss-Jordan 
method is therefore numerically equivalent to the elimination stage of Gaussian Elimination 
method followed by a forward-elimination process to diagonalize the intermediate upper triangular 
form. More precisely, the Gaussian Elimination method explicitly finds a matrix L whose inverse 
L~\ in implicit product form, is used to reduce the matrix A to an upper triangular matrix U such 
that 

A = LU , L~ ] A = U. 

The Gauss-Jordan method creates the same L and an explicit unit-diagonal upper triangular matrix 
M such that 


I ML~ l A = D. 

Henceforth we shall restrict our attention to the class of methods by which the matrix A is 
decomposed first into a product of lower and upper triangular matrix in one of the following two 

forms: 

(3.1a) A=LU 

i 

I 
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(3.1b) 


A = PR 


where L and R are unit-diagonal lower and upper triangular matrix, respectvely; and P and U are 
general lower and upper triangular matrix, respectively. Once the matrix A is decomposed, then 
L , U , P , and R can be explicitly or implicitly inverted so that A~ } can be obtained. 

Expanding the products in (3.1a) and (3.1b) explicitly, the decompositions can be obtained 
by the following recursive equations: 


(3.1al) 


k-l 

u kj =J 1 ( a kj - Yj )» 1 <k<j<N, 
/=1 
/-! 

Iji =fl{ ( Oji - 'Y J l jt u td /% ). 1 <i<j<N, 

t—1 


(3. 1 b 1) 


k-l 

Pjk a jk ~ YjPjftk)’ 1 <^< 7 <^. 
/= 1 
/-I 

a ij - ) * 1 N- 

t= 1 


The equations in (3. la 1) simply say that any u kj can be computed as soon as the £-th row of L and 
the first k-l elements of the y-th column of U arc available, and any l J{ can be computed as soon 
as the first / — 1 elements of the y-th row of L and the i -th column of U arc available. Equation 
(3.1bl) can be interpreted similarly. The only freedom left to an algorithm designer is the order in 
which the summations in (3. la 1 ) and (3. Ibl) should be excutcd. We shall call an order optimal if 
the computed result has the minimal maximum and cumulative error complexities. We have the 
following theorem: 

Theorem 3.1 Let the given matrix A be such that 

Sn=q. = {1,0,0}, {1,0,0}, 1,1). 


n 



ITic optimal order to compute the summations in (3.1al) and (3. lb 1) is the lcft-to-right strategy for 
their evaluation using the explicit expressions given by 


u kj ~fl( a kj ~ k\ u \j~ lk2%2J ~ — ~ ^kji—\ u k—\j )• 1 <k<j< N, 
Iji =J1( (<*ji - - lj2“2i - ••• - l jj-\ u i-U )K). 1 ^ ‘<j<H 


Pjk a jk~ Pj\ r \k~ Pj2 r lk - --Pj,k-\ r k-\Ji )> 1 ^ -' v - 

r ij=fl( a ij ~ Pi\ r \j ~ Pa r 2j ~ - - Pi,i-\ r i-lj ) - 1 <i<j<N 


and the computed u kj , l Jt , p jk , and r tJ satisfy 


u kk Pkk e,»C 2 » ... C A _j. ’ u kj Pjk c,*c 2 . ... c fe _,, ’ ^ + - 

h = = 1 <i<j<N 


where 


c *+i* = { ; -( c *+i*)> a ( c k+ 1»). J ( c *+i*)} = c k+ 1 = °( c k+ 1)’ J ( c Jt+i)) = C * C * A 13< 

A i3 =eA + A 3 , 1 < A: < A r — 1. 

Proof. See Appendix I. 


By Theorem 3.1 we see that all variations of the class of methods for decomposing A into 
LU or PR are equivalent among those methods of their respective class in terms of our complexity 
measures as long as the left-to-right strategy is adhered to in the evaluation of (3.1a2) or (3.1b2). 
Two variations each for the decompositon of A into forms given by (3.1a) and (3.1b) are listed 
below: 


Algorithm G {Gaussian elimination for A = LU } 

for /= 1 to ,V— 1 do 
for j — i -f 1 to N do 

L=ft(pJPi) 
for k = / 4- 1 to N do 
Ojk =fl( Pk ~ Iji x a, k ) 

for Ac = 1 to N do 
for j = k to N do 
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Algorithm DL { Doolittle method for A - LU } 

for j = 1 to N do 
for / = 1 to y — 1 do 

=/7( ( a Jt - X /„!/*, )/«,,) 

J 

for A =y to N do j 

H* =/7( 4* - 2U-* ) 


Algorithm G1 {Gaussian Elimination for A — PR) 

for i — 1 to N — 1 do 
for k = i -f 1 to A' do 

=y?( PJP, ) 

for j — i 4- 1 to ;V do 
a,* =fl{ a jk - a,, x r lk ) 
for A' = 1 to N do 
for j = A to .V do 

Pjk ” a jk 


Algorithm CR (Grout method for = PP } 

for A = 1 to /V do 

for j — 1 to A do 
y-i 

Pkj = flk °kj ~ YPkJmj ) 
for j = A + 1 to A^do 

A — 1 

r kj = Eft/.; )//>** ) 


We are now ready to find the inverses of L, (/, P, P either implicitly or explicitly. Given L , 
the product form of L _1 can he obtained without additional computation as 


iV 

(3.2a) = L N ]_ l c ... L l( } , L lc x = 7^— ^ IjfijpF * * ^ ^ N — 1> 

M+l 


i-i 

(3.2b) L d ] l - L N ]_ X r ... , Ly. X — I N - , 2 <i<N. 

7-1 


On the other hand, one can also obtain explicitly a lower unit-diagonal matrix M such that 
\/A - i s and 


N 

(3.2c) LJfc = M ]c ... > M k ^I N - , 1</<A'-1, 

y=/+i 
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(3.2(1) 


I'Mr -M 2r — M Hr » 


f— 1 

iV/^ = /^ — ^y y / n ij € i e j > 2 < / < A . 

y=i 

Note that (3.2a), (3.2b), (3.2c), and (3.2d) can be regarded as methods creating zeroes in the given 
L by column-wise forward , row-wise forward, column-wise backward, and row-wise backward 
elimination, respectively. 

Solving for ML = wc find that 


*-l 

“^( h-\~kj ~ Y nh+kj+A+jj )» 1 < & < A — 1 , 1 < / < A — 


In other words, m i+kti depends on those elements of the (/ + k) -th row of M to its right, as well as 
i and those above it in the /-th column of L. A proper algorithm is the following: 


Algorithm M 

for k = 1 to N — 1 do 
for / = 1 to N — k do 
j=i+k 

”*ji hi ~ m u-\h-u - “ m u+ \l+u ) 


We have the following theorem: 


Theorem 3.2a Given L whose error complexities satisfy Theorem 3.1, then the optimal or- 
der to find M using Algorithm M is again the left-to-right strategy for the summations. The com- 
puted M has error complexities given as 


k k 

m i+kj = ( 1 )^13 A/( )• 

y=! J= 1 


Proof. See Appendix II. 
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Since L - 1 is applied to A and L~ X A = £/, the same operations need to be applied to the right 
hand side vector A in solving Ax = b so that the reduced system Ux = /can be solved later where 
/ - One natural question is whether (3.2a) through (3. 2d) are equivalent in the sense that the 

computed / using any of them will have the same error complexities. For (3.2a) and (3.2b) this is 
true as we can simply augment A with b as its (N + l)-st column and a decomposition of A = LU 
can be performed with U being a trapezoidal matrix having /as its last column. For (3.2c) and 
(3. 2d) the results of Theorem 3.2a can be applied and we can easily obtain the following theorem 
by induction: 

Theorem 3.2b If the given right hand side vector b is such that b t = { 1,0,0}, then the com- 
puted 

f=J HL~'b) 

using any one of (3.2a) through (3. 2d) is such that 

fi = u iN , l <i< N 

provided the summations used to evaluate f in (3.2b) and (3. 2d) are carried out using the left-to- 
right strategy in the following expressions: 

(3.2b 1 ) f t =Jl{b i - lj x - - ... - ), \<i<N-\, 

(3. 2d I) fi — fl ( j bj_ | - m lJ _ 2 b l _ 2 - ... - ), 1 < / < N - 1. 

To find P~ l the situation is similar. Two different forms of P~ x can be found without addi- 
tional effort: 

(3.3a) Pc\=D- n 'P- n X _,' C ...D^P^DI\ 

(3.3b) PcR = DjP„' r .. .D^P^DT' 


where 
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D 


N i - 1 

\ ~ I N ~ e i e i hi e i e i > P ic ~ In ~ Pji e j e i * ^ir = I N ~ N Pij e i e j • 


J =‘+ 1 


y'=i 


Using an additional 0(N 2 ) divisions one can also obtain the following two forms easily: 


(3.3c) 


N 

Pcl ] =D-'pJ_ Uc ,...p; t }, pp) = /, v — 

j—i+1 


(-1 

(3.3d) Pc' R i = ... Pj; , = /„- 5>V/e/, 2 < / < N 

j= l 

where 

P’ji =fl{PjilPn ) . ° = dicig[p u , p NN l 

Note the above are forward type of algorithms. In (3.3a) and (3.3b) P is gradually reduced 
to an identity matrix, whereas in (3.3c) and (3.3d) it is reduced first to a diagonal matrix. If we 
prc-multiply P first by D~\ then the new matrix becomes a unit-diagonal lower triangular matrix 
and four new forms similar to (3.2a) through (3.2d) can be derived. By Theorem 3.2b we know that 
they arc equivalent among themselves. We shall only give one of them in detail: 

N 

. ^ , P 0\2 = P N-\ % c" - P \c D 1 » V ~ I N ~ X! P"ji e j e f ’ 

J =/+1 

P"ji=fl(PjilPjj)> \ <i<N- 1 . 

f inally P can also be reduced first to diagonal matrix by backward type of algorithms. We 

have 


N 

(3.30 = 1 < / < A' — 1 , 
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1-1 

P Q ] r = L) ^Qlr - Qsr > fir = V “ X W/’ 2 ^‘^ N 

y=l 


^i+W A ( ( A‘+W 4i+fc,/+/:— 1 A+Jk— 1 ,/ • • * #/+£,*+ lft+ 1 ,/ )/Av ) » 

1 <k<N- 1 , 1 <i<N-k. 


I lie following theorem can be proved using kmma 2.3 and induction: 


Theorem 3.3 


(3-3g) 


where 

(3.3h) 


(i) Given the matrix P whose error complexities satisfy Theorem 3.1, then the optimal order 
to find Q using (3.3h) is the left-to-right strategy for the summations. The computed Q has error 
complexities given as 


cfi+kj = Wi+kji ’ 1 < k < N — 1 , 1 <i<N — k. 

(ii) If the given right hand side vector b is such that b t = {1,0,0}, 1 < /< A r , then the com- 
puted 

g =fi{p- ] b) 

using any one of (3.3a) through (3.3g) is such that 

Sn — c n&I ftv* . ft — r lN , 1 < / < N - 1 

prov ided the summations used to find ft in (3.3b), (3.3d), (3.3g) are carried out , respectively, using 
the left -to-riglit strategy' in each of the following expressions: 

( 13hl ) — Pi\g\ - ... - /> />( -_,ft-_, )IPu), 1 < / < N, 

(3.3d 1 ) ft — fl {^i! Pa ) * h=JH b l-Pl\ h \ - • -Pi,i-A-\)' 1 < / <V , 
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(3.3gl) 


Si ( h - - ... - q n b , )lp u ) , 1 < /< N. 


Methods for finding and U~' are similar to those for finding L~> and P~\ respectively. 
We shall simply list them and give the resulting theorems without proofs. 


Methods for R~ l : 


(3.4a) 


/-I 

&B\ = ^2c — > &ic 1 = In ~ Y/M 1 2 < / < N, 

7=1 


(3.4b) 


N 


R B2 ~ R\r ^A— \,r » ^iV “ Av r ij e i e j » I ^ z ^ N 1, 

7=/+l 


/— 1 


(3.4c) Rjl =M' Nc „.M’ 1c , M' ic = I N -Y J m 'ji e d’ 2 < / < N , 


7=1 


(3.4d) 


A r 


M' ir = I N - Y i rf u e ( eJ', \<i<N-l 

7=^1 


where A/' is a unit-diagonal upper triangular matrix computed by the following formula: 


(3.4c) 


m i,i+k fl ( r i,i+k m ij+ 1 ^i+ 1 ,/+/e * * * ^ ij+k— 1 r i+k— 1 ,/+/c ) » 

1 <k<N- 1 , 1 </< W-A. 


Theorem 3.4 

(i) Given R whose error complexities satisfy Theorem 3.1, then the optimal order to evaluate 
(3.4c) is the left-to-right strategy. The computed XV is such that 

m’ij+k = ™i+kj f 1 < k < N — 1 , 1 <i<N — k. 


(ii) given g whose error complexities satisfy Theorem 3.3, then the computed 
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using any one of (3.4a) through (3.4d) is such that 


x i 


c i c i+\ ••• C A ! -rN-tr 

7-7 7 — a 13 a 

c i* c i+ 1* — C N* 1J 


1 < i < N 


provided the summations used to evaluate jc, in (3.4b) and (3.4d) are carried out in optimal order 
of the left-to-right strategy , respectively, using the following expressions: 

(3.4b 1) x N = g N , x i = fl (gj - r lN x N - ... - r u+l x i+l ), 1 < / < A' - 1, 

(3.4*1 1 ) x, =JI(gi - m' iJ+] g i+l - ... - m' iN g N ) , \<i<N. 

Methods for U~ x : 


(3.5a) 


(3.5b) 

^ = &u\ U\ r ■■■ Du)n- 1 ^N-\,r^UN 

where 



/-] 


T T 

D(ji — I M — cfii + , 


Ufc ~ i n Iw 
y=i 




- V 

y=<+i 


Un} = D7,'U^ ...U 






in 


V l 2 <f 


'Nd 


i—l 

Vu-h-Yfltrf 

j= 1 


2<i<N, 


:V 

(3.5d) = = f ~ 1 . 

;='+i 


where 


tiji = fl ( uy/ujj ), D a = diag [u u , , w AW ]. 


/-l 

ji C j e i J u ji" fl { u jil u jj ) > 2 < / < A . 

y=! 


/-I 

( 3.50 Ur l ' = Du'Q' Nc ...Q' 2c , Q k = I N - Y/j&T > 2<i<N, 

j= i 


N 

(3.5g) = Z>^ 0Vi , ... C'lr . 0V = /yv - E 1 

y=*+i 


where is a unit-diagonal upper triangular matrix computed by the following formula: 
(3.5h) q j.^ =y?(( ~ <7 /,/+ i w /+i > / 4 -^ ~ ~ <7 /,/+£- l w /+/r-l,/-f-fc )/ w i+/r,/+£ ) * 


Theorem 3.5 


(i) Given the matrix U whose error complexities satisfy Theorem 3.1, then the optimal order 
to find Q ' using (3.5h) is the left-to-right strategy for the summations. The computed Q* has error 
complexities given as 


(3.x:) U B l - U 2 c” ... Ufo’D u l , L J iJ - 


i~\~k — 1 

4' i,i+k ~ c i*( [] Cj )&^'&lc i+k . , 1 <k<N— 1 , 1 <i<N-k. 
j—i 

(ii) Given / whose error complexities satisfy llieorem 3.2b, then the computed 

x'=j nir'f) 

using any one of (3.5a) through (3.5g) is such that 

x'j - x t , 1 < / < yV 
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provided the summations used to evaluate x\ in (3.5b), (3.5d), (3.5g) are carried out, respectively, 
using the left-to-right strategy in each of the following expressions: 


0.5b 1) x’t u iN x' N - ... - w v+ ,*'j +1 )/U|j ) . 1 <i<N, 

(3.5d 1) x'i =fU}'jlu u ) ,y t =fl(fi - i/ iN y N - ... - Uy+iJVH ) » 1 - ‘ - N > 

(3.5gl) y, , - ... - )lua ),\<i<N. 

By r rhcorem 3.5 wc conclude that the class of inverse decomposition methods based on 
finding the trhingular factors of the matrix A followed by explicitly or implicitly inverting the tri- 
angular factors are equivalent among themselves in terms of our error complexity measures. 
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Appendix I. Proof of Theorem 3.1 


We prove by induction on k and /. For <: = /= 1 we have 

U\j = a,j, 1 <j ^ A r ; l, I =fl(a t \l u \ i ) = a n A l a \ 1 » 2<1<N. 


Hence 


u \ i — c \* > u \j — c \ » 2 <j < N y 
4l ~ a t]^l a \ 1 “ C \^i C \* i 2< t < ;V 

and the theorem is true. Assume the theorem is true for & = / ■= r — 1. For k~i—r we assume 
that in the computation of the given 4, and for 1 < f < £ — 1 are calculated by the optimal 
order and so u kl can be considered as the computed result of 

z = y l +y 2 + ... +y k , y x =a kj , y t+x =J7(l kl u ti ) , l<f<*-l. 

Now by assumption 


>’i ~ c i * 4 / — c Al c t* > 


“// = ■ 


: l* c 2* 




I k nee 


- C f r / A2 

- 1 '! c l ’ J'r+l C]*C2* ... c,. 


]</<&— 1 . 


Now if exact additions were possible, then 


/=! 


where 




= c 


P c 2* 




A-P 


£ n\ 


= C 


1 C P C 2* — C A-P » 2 *,r+l “ c / c / c r+P c /+2* C A-P^ , \ <t<k — 1. 
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By definition we have 


**/+!* C /*V^13 c r c /{2,3»4}. 


Hence 

2(c,+ 1 *) = M c t+\) = 2l 2 (c t ) - <r(c t +i») = <?(C[+i) = 2 o(c t ) + 3 > o(c t ) + 1. 

Applying the above equations to z %ti , l and z nt we have 

Mfn 2) _ ^nj+O K't) . 

A(Z„,) ’ /Or,..,) A 2 (C/ _ i} -> 

ff ( z /!,/+l) “ = <*(<*) “ 2o(c i _ ,) = 3 > 1. 


The above equations satisfy the conditions of h'mma 2.3, hence by the lemma the best strategy in 
evaluating 


u kj = fl(y\ + /2 + ••• +J , /c) 

is by the left-to-right strategy. Thus by Lximma 2.3 and the induction assumption we have 

“kj = (y\A k 1 *+ — + ) + J’/fA 

_ c k -l c k-l& c k 

~~ c \* c 2* c k~2* c \* c 2*~ c k-\* ~ c \* c 2* •• - c k-\* 

Similar reasoning can be used to show the remaining part of the theorem for lj s. This completes 
our proof. Q.E.D. 


! 
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Appendix I. Proof of Theorem 3.1 


We prove by induction on k and i. I : or k = i — 1 we have 

u\j — a t j , 1 <j < N; /„ =fl(a n ju\ , ) = a n A/a, , , 2<t< N. 

Hence 

u u = Cj* , z7 iy = c x , 2 <j<N, 


/ ;1 — j — fj A/cj* , 2< t < .V 


and the theorem is true. Assume the theorem is true for & = / - r — 1. For k = i = r we assume 
that in the computation of the given 4, and for 1 < f < A: — 1 are calculated by the optimal 
order and so u k , can be considered as the computed result of 

z=y i +^2 + - +yk • y\ = % . y t +\ ) - i < < < * - i. 

Now by assumption 


y l — c i > — c A! c t* ’ 


u v = ■ 


: l* c 2* 




I lence 


— _ _ y, a 2 

>i c i > >V+i c,*c 2 * ... Cj* 


1 < / < A' - 1 . 


Now if exact additions were possible, then 


/=! 


where 


2 d~ c \' c 2* ••• Z *1 - c l c l* c 2* ••• c £-l* > z nj+\ 


c t c t c t+\* c t+ 2* 


L k - 1 


*A Z 


1 < t < k ~ 1. 
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By definition we have 


C t+X . = c t+1 = c,c, Aj 3 = C,C,{2,3,4}. 


Hence 

/(c f+1 .) = 2 (c, +1 ) = 22 2 (C/) , <*(< 7 + 1 *) = <K< 7 +i) = 2ct ( c /) + 3 > ff ( c r) + 1. 


Applying the above equations to z„, +l and we have 

2-( z w 2) _ , ^ z n.r-n) H c t) _ 9 

2 ( 2 „i) ’ >.(z Rt ) X 2 {c t _ x ) 

a ( z n,i+ 1 ) - a ( z ni) = ^(Cj) - 2ff(c f _]) = 3 > 1. 

'The above equations satisfy the conditions of Lemma 2.3, hence by the lemma the best strategy in 
evaluating 


u kj=fl(y\ +J2 + - +J’k) 


is by the left-to-right strategy. Thus by Lemma 2.3 and the induction assumption we have 
u k j ~ tVi A* ] +y 2 A k 1 + ... + y k _\A 2 ) + y k A 

_ 1 A c k--\ c k- 1 A = c k 

c \* c 2* ... C k _2* C\*C 2 * ... c k-\* c \* c 2* 

Similar reasoning can be used to show the remaining part of the theorem for / Jt 's. This completes 
our proof. Q.E.D. 
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